KANTOROVICH METRIC: INITIAL HISTORY AND
LITTLE-KNOWN APPLICATIONS 

A.VERSHIK 


Abstract. We recall the history of the transportation (Kantorovich)
metric and the Monge-Kantorovich problem. We also describe several
little-known applications: the first one concerns the theory of
decreasing sequences of partitions (tower of measures and iterated
metric), the second one relates to Ornstein’s theory of Bernoulli
automorphisms ($\bar{d}$-metric), and the third one is the formulation of
the strong Monge-Kantorovich problem in terms of matrix distributions.
Bibliography: 30 titles.


1. Introduction: the first papers on the transportation problem

The studies on the transportation problem could be called a true
pearl in the extremely rich scientific legacy of L. V. Kantorovich. The
beauty and naturalness of the formulation, the fundamental character
of the main theorem (the optimality criterion), and, finally, the wealth of
applications (some have already been realized, while new ones keep
arising in areas that are only now emerging) - all this allows us to place these
studies among the classic mathematical works of the 20th century.
Undoubtedly, the same can be said of the whole series of papers
on linear programming (from which the transportation problem cannot
be separated), which became the starting point for further studies in
mathematical economics, but here we will dwell only on the remarkable
role of what was later called the “Monge-Kantorovich problem” and
“transportation metric.” 1 In this introduction we do not intend to
give a survey of this huge subject; we will mention only the very first
papers of L.V. and his co-authors.

1 This metric has a dozen known names (the most widely used is the Vasserstein metric),
because it has been rediscovered more than once and keeps being rediscovered.
For many years I had to explain that many metrics known in measure theory,
ergodic theory, functional analysis, statistics, etc., introduced in the 50s-80s, are
special cases of the general definition of Kantorovich’s transportation metric. Many
papers and books have appeared since then (see, for example, [18]), but maybe it
is only now (2004) that we can say that the publicity of the main facts discovered
by L.V. and his co-authors matches their importance.

Apparently, L.V. conceived the formulation of the transportation
problem soon after he defined the general model of the production
planning problem, i.e., in the late 30s (the booklet [8]). However, if we
judge from the date of the first publication, the transportation problem
was born in 1942, with the publication of the note [10], which later
became famous. The year itself predetermined the long road this paper
had to walk to become known to specialists. The paper contains an
explicit formulation of the general continuous transportation problem
on a compact metric space, the dual problem, and the optimality criterion.
Later, in the small note [11] published in Uspekhi Mat. Nauk,
Kantorovich established a relation to Monge’s problem of excavations
and embankments, i.e., to the transportation problem on the Euclidean
plane. Since then, the general Kantorovich problem is sometimes called
the Monge-Kantorovich problem (MK-problem for short). The next paper
[6], joint with a pupil of L.V., M. K. Gavurin, was addressed rather
to applied mathematicians and economists; it contained a development
of the method of potentials (a version of the method of resolving multipliers
suggested by L.V. in 1939) for solving the finite-dimensional
transportation problem. Written long before publication, it appeared
only in 1949, and this delay was caused not by the wartime conditions,
but by the Soviet practice of that time, when each scientific paper that
even slightly touched economic (not to mention socio-economic) problems
had to go through long and absurd censorship; besides, the paper
was published not in a journal, but in a special hard-to-reach volume.

Till 1956, i.e., during 18 years of existence of the new mathematical
economic theory, L.V. and his co-authors published fewer than 10 papers
on this subject (I remember G. Sh. Rubinshtein making up, at my
request, the complete list of these papers in autumn 1956). Surely this was
not because texts dealing with these problems had not been written. L.V.
had already prepared a whole book on economics, whose fate is
an exact and gloomy illustration of the system’s attitude to scientific
studies that do not fit within obligatory schemes, rigid and hence
fruitless. A revised version of the book was not published until almost
twenty years later ([13]).

In 1955-56, L.V. decided to “open” this topic; he began to give 
public and special lectures, to popularize his theory. The moment was 
chosen quite well. However, the wide distribution and acknowledgment 
of these studies were still a long way off. One can read about all these 
events in the book [16] (in particular, in my paper [26]), but a detailed 
account of the whole story is still to be written. 
Let us return to the transportation problem. The third important
paper on this subject was the paper [15] by L.V. and his pupil and
co-author G. Sh. Rubinshtein. It is this paper that contained an explicit
definition of the norm in the space of measures related to the
transportation metric. The main observation was that the conjugate
space to the space of measures with this norm is the space of Lipschitz
functions, and the optimality criterion is nothing more than the dual
definition of the norm as a supremum over the sphere of the conjugate
space. Before this paper it was not known whether the space of
Lipschitz functions is conjugate to any Banach space. At that time
(1956-57) I was interested in mathematical economics and maintained
close contacts with L.V. and G. Sh. Rubinshtein, and G. Sh. described
to me in detail the stages of their work; in particular, he said that L.V.
was very satisfied with this interpretation of the transportation problem.
After this paper, the metric is often called the Kantorovich-Rubinshtein
metric.

Here it is worthwhile to make two remarks. Of course, the idea of duality
was contained from the very beginning both in the booklet of 1939
(the method of resolving multipliers) and in the note by L.V. in Doklady
Akad. Nauk SSSR [9] - the first paper devoted to comprehending the
relations between functional analysis and nonclassical linear extremal
problems (calculation of norms and extrema); it is worth noting that
this was one more example showing the utility of functional analysis
for applications; see the paper [14], devoted to applications of linear
programming to computational mathematics, and the classical work by
L.V. on the Newton method [12]. On the other hand, the technique
that consists in taking the objective function as the norm in the space
of right-hand sides of an extremal problem (exactly this was suggested
in [15]) can be successfully applied to many extremal problems (see,
for example, [19, 28]). It was noted more than once that both classics
of mathematical economics of the 20th century - von Neumann and
Kantorovich - came from functional analysis.

We cannot but mention that eventually, in the course of the development
of the theory of nonclassical extremal problems, other relations became
obvious: to the theory of linear inequalities and separability theory,
Chebyshev approximations and Krein’s L-moment problem, Weyl’s
studies on convex polytopes and convex geometry as a whole, Bourbaki’s
theory of polars and combinatorics, etc. 2 Today we would
include in this list “tropical” mathematics, or max-plus algebra, and
the rapid development of applications to differential equations,
in particular to the Monge-Ampère equation, hydrodynamics, and so on (see
the references). We will not discuss these well-covered applications here.

2 The lecture course “Extremal problems,” which I taught for many years at
the Department of Mathematics and Mechanics of the Leningrad State University,
was compiled taking into account all these relations; in fact, it was a synthesis of
functional analysis and the theory of extremal problems. The textbook based on
this course was not finished, but part of the material was included in the textbook [1]
written by my pupil A. I. Barvinok.


2. Basic definitions 


The transportation problem has always held a prominent
position among the problems of linear programming, owing to its general
formulation and methods of solution. In what follows, I would like to
present several little-known applications of the transportation metric;
but first let us recall the formulation of the transportation problem.


Definition 1. Let $(X, r)$ be a compact metric space, and let $\mu_1$ and
$\mu_2$ be two Borel probability measures on $X$. Consider the
Monge-Kantorovich variational problem (MK-problem for short): set

$$k_r(\mu_1, \mu_2) = \inf_L \int_{X \times X} r(x_1, x_2)\, dL, \tag{1}$$

where $L$ runs over all Borel measures on $X \times X$ with marginal measures
$\mu_1$ and $\mu_2$.

The quantity $k_r(\mu_1, \mu_2)$ determines a metric on the simplex $\mathcal{V}(X)$
of all probability measures on the compact space $X$; it is called the
Kantorovich (or transportation) metric ([10]).


Remark. The measure $L$ is a “plan of transportation” of the distribution
$\mu_1$ to the distribution $\mu_2$; the integral is the cost of a
given transportation plan, and the infimum (the Kantorovich metric)
is achieved at the optimal plan.
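
For a finite space the definition reduces to a finite linear program, which may help to fix ideas. The restatement below is only a sketch under the assumption that $X = \{x_1, \dots, x_m\}$ is finite; the symbols $a_i$, $b_j$, $r_{ij}$, $L_{ij}$ are notation introduced here for illustration and are not taken from the original papers.

```latex
% Finite case (illustrative notation):
%   a_i = \mu_1(\{x_i\}),  b_j = \mu_2(\{x_j\}),  r_{ij} = r(x_i, x_j).
\[
k_r(\mu_1, \mu_2) \;=\; \min_{L}\ \sum_{i,j=1}^{m} r_{ij}\, L_{ij}
\quad\text{subject to}\quad
\sum_{j=1}^{m} L_{ij} = a_i,\qquad
\sum_{i=1}^{m} L_{ij} = b_j,\qquad
L_{ij} \ge 0 .
\]
```

Here $L_{ij}$ is the mass moved from $x_i$ to $x_j$. The feasible set is a nonempty compact polytope (it contains the product plan $L_{ij} = a_i b_j$), so in the finite case the infimum is attained.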

Theorem 1. (Kantorovich-Rubinshtein [15]) (1) Consider the vector
space $\mathcal{V}_0(X)$ of all (not necessarily positive) Borel measures $\nu$ with
zero charge and finite variation (i.e., the positive part, $\nu_+$, and the
negative part, $\nu_-$, of $\nu$ have the same finite variation) and define the
Kantorovich-Rubinshtein norm $\|\nu\|_k$ of an element $\nu \in \mathcal{V}_0(X)$ as the
Kantorovich distance between the positive and negative parts of $\nu$:

$$\|\nu\|_k = k_r(\nu_+, \nu_-).$$

Then the space of Lipschitz functions (considered up to an additive constant) with the
Lipschitz norm is the conjugate normed space to the space $\mathcal{V}_0(X)$ with
the norm $\|\cdot\|_k$.


(2) A plan $L$ in (1) is optimal if and only if there exists a Lipschitz
function $U$ with Lipschitz constant 1 such that

$$U(x) - U(y) = r(x, y)$$

almost everywhere with respect to the plan $L$.


We will omit the index $r$ in the notation $k_r$ if the metric $r$ is fixed,
as well as the index $k$ in the notation $\|\cdot\|_k$.

Remark 1. The Kantorovich metric induces the weak topology on the 
simplex of probability measures on the compact space X ([15]). 

Remark 2. In the framework of solution of the finite-dimensional 
transportation problem, the optimal Lipschitz function U is nothing 
more than the Kantorovich-Gavurin potential from [6]. 
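
The optimality criterion can be checked by hand on the smallest nontrivial example. The following worked computation is a sketch for a two-point space; the points $a, b$, the distance $d$, and the weights $p \ge q$ are assumptions chosen for illustration.

```latex
% Two-point space X = {a, b}, r(a,b) = d > 0 (illustrative data):
%   \mu_1 = p\,\delta_a + (1-p)\,\delta_b,  \mu_2 = q\,\delta_a + (1-q)\,\delta_b,  p \ge q.
\begin{align*}
\text{Plan:}\quad & L(a,a) = q,\quad L(a,b) = p - q,\quad L(b,b) = 1 - p,\quad L(b,a) = 0,\\
\text{Cost:}\quad & \int r\, dL = d\,(p - q),\\
\text{Potential:}\quad & U(a) = d,\quad U(b) = 0 \quad (\text{Lipschitz constant } 1),\\
\text{Dual value:}\quad & \int U\, d(\mu_1 - \mu_2) = (p - q)\bigl(U(a) - U(b)\bigr) = d\,(p - q).
\end{align*}
% On the support of L: U(a)-U(a) = 0 = r(a,a), U(b)-U(b) = 0 = r(b,b),
% and U(a)-U(b) = d = r(a,b), so the criterion of Theorem 1(2) holds.
```

The cost of the plan equals the value of the dual functional, so the plan is optimal and $k_r(\mu_1, \mu_2) = d\,|p - q|$ in this example.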

There is a huge number of difficult problems related to the explicit
calculation of the Kantorovich metric for a given compact space. For $\mathbb{R}^2$,
this is the classical Monge problem on the transportation of sand. For
$\mathbb{R}^1$, there is a good answer: let $\nu_1$ and $\nu_2$ be two probability measures
on $[0,1]$, and let $r$ be the ordinary (Euclidean) metric; then

$$k_r(\nu_1, \nu_2) = \int_0^1 |\nu_1([0,t]) - \nu_2([0,t])|\, dt,$$

i.e., the Kantorovich metric is just the $L^1$-metric for distribution functions.
Apparently, there are no explicit
formulas for $\mathbb{R}^n$, $n \ge 2$. Many papers are devoted to this problem; we
will mention only the recent surveys [29, 30, 5].
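
The one-dimensional formula is easy to evaluate for finitely supported measures. The following is a minimal sketch, assuming both measures are given as dictionaries mapping atom positions to masses; the function name `kantorovich_1d` and this input format are illustrative choices, not part of the original text.

```python
def kantorovich_1d(mu, nu):
    """L1 distance between the distribution functions of two finitely
    supported probability measures on the real line.

    mu, nu: dicts mapping atom position -> mass (masses sum to 1).
    """
    points = sorted(set(mu) | set(nu))
    total = 0.0
    cdf_mu = cdf_nu = 0.0
    for left, right in zip(points, points[1:]):
        # Add the atoms at `left`; both distribution functions are then
        # constant on the interval [left, right).
        cdf_mu += mu.get(left, 0.0)
        cdf_nu += nu.get(left, 0.0)
        total += abs(cdf_mu - cdf_nu) * (right - left)
    return total


# Moving a unit mass from 0 to 1 costs 1.
print(kantorovich_1d({0.0: 1.0}, {1.0: 1.0}))
# Splitting a unit mass at 0.5 into halves at 0 and 1 costs 0.5.
print(kantorovich_1d({0.0: 0.5, 1.0: 0.5}, {0.5: 1.0}))
```

Outside the convex hull of the atoms the two distribution functions coincide (both are 0 to the left and 1 to the right), so only the intervals between consecutive atoms contribute to the integral.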

However, it makes sense to mention an essential idea, which has
appeared recently and which plays a very important role in modern
applications to hydrodynamics, differential equations, and other areas
(see [4, 5, 30] and references there); I mean the $p$-Kantorovich norms
(see [2]). Namely, the original definition of the Kantorovich metric
(and Kantorovich norm) resembles the definition of the $L^1$-norm; but
we can also define an analog of the $L^p$-norm,

$$k_p(\nu_1, \nu_2) = \inf_L \left( \int_{X \times X} r(x_1, x_2)^p\, dL \right)^{1/p},$$

where the infimum is taken, as before, over all transportation plans $L$
for a pair of probability measures $(\nu_1, \nu_2)$, and the corresponding norm

$$\|\nu\|_p = k_p(\nu_+, \nu_-)$$

for all $p > 1$. Of course, the original Kantorovich metric (the case
$p = 1$) has more physical significance, but the case $p = 2$ is much
more convenient from the technical and geometric point of view. The
corresponding variational problem and Euler equation are simpler than
in the case $p = 1$, and the results of [2] show that for a certain geometric
transportation problem, the Euler equation is the well-known
Monge-Ampère equation (which a priori has nothing to do with the
Monge-Kantorovich problem).

Let us mention another important special case, which is sometimes
also called the MK-problem; we will call it the strong MK-problem.
Namely, with the above notation, it is formulated as follows: to find

$$\bar{k}(\mu_1, \mu_2) = \inf_T \int_X r(x, Tx)\, d\mu_1(x), \tag{2}$$

where the infimum is taken over all measurable mappings $T$ such that
$T\mu_1 = \mu_2$.

The existence of a minimum in (2) is a very subtle question. Of course,
$\bar{k}(\mu_1, \mu_2) \ge k(\mu_1, \mu_2)$, and the question of when the inequality becomes
an equality is difficult and very important. In the last section, we will
present a new approach to both problems.

Among a huge number of applications of the Kantorovich metric, 
I would like to mention only three examples, which are little known 
to specialists in applications of this metric, yet are very important in 
dynamical systems and functional analysis. 

3. The iterated Kantorovich metric and the tower of measures

We will begin with the notion of the tower of measures, which was defined
in [20] and considered in more detail in [23], [39]. Let $(X, r)$ be an arbitrary
compact metric space (say, the unit interval with the Euclidean
metric). We can consider a new compact space $\mathcal{V}(X)$, the space of all
Borel probability measures on $X$, and supply it with the Kantorovich
metric. Thus we have defined a functor $F$ from the category of compact
metric spaces to itself: $F : X \mapsto \mathcal{V}(X)$, $r \mapsto k_r$; it is clear that $F$
sends each homeomorphism of a compact space $X_1$ to a compact space
$X_2$ to a homeomorphism of $\mathcal{V}(X_1)$ to $\mathcal{V}(X_2)$.

Obviously, $(X, r)$ can be isometrically embedded into $(\mathcal{V}(X), k_r)$ via
the mapping $x \mapsto \delta_x$.
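
The isometry can be verified in one line. The following short derivation is a sketch; the symbol $\delta_{(x,y)}$ for the unit mass at the point $(x, y)$ of $X \times X$ is notation introduced here.

```latex
% A measure L on X x X with marginals \delta_x and \delta_y must be
% concentrated on \{x\} \times X and on X \times \{y\}, hence on the single point (x, y):
\[
L = \delta_{(x,y)}
\quad\Longrightarrow\quad
k_r(\delta_x, \delta_y) = \int_{X \times X} r(x_1, x_2)\, d\delta_{(x,y)} = r(x, y).
\]
```

Since there is exactly one transportation plan between two unit masses, the infimum is trivial and the embedding preserves distances.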

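Let us iterate this procedure:

$$(X, r) \to (\mathcal{V}(X), k_r) \to (\mathcal{V}(\mathcal{V}(X)), k_{k_r}) \to \cdots.$$

Set $\mathcal{V}^n = \mathcal{V}(\mathcal{V}^{n-1}(X))$ and $k_r^n = k_{k_r^{n-1}}$ and introduce the notation $F_n$
for the mapping $(\mathcal{V}^{n-1}, k_r^{n-1}) \to (\mathcal{V}^n, k_r^n)$.

We can consider the inductive limit of this sequence of metric spaces
with isometric embeddings:

$$(\mathcal{V}^\infty, k_r^\infty) = \operatorname{indlim}_n \bigl((\mathcal{V}^n, k_r^n), F_n\bigr).$$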
This inductive limit (a metric space) is called the infinite tower of
measures; it plays a crucial role in the theory of filtrations of $\sigma$-fields
generated by random processes and its various applications.

On the other hand, for $n \ge 2$ there is a natural projection

$$P_n : \mathcal{V}^n \to \mathcal{V}^{n-1}, \qquad P_n(\mu) = \bar{\mu},$$

where $\bar{\mu}$ is the barycenter of the measure $\mu$, which is well defined for
measures on affine compact spaces (thus the projection is defined for
$\mathcal{V}^n$, $n \ge 2$), and we have the sequence

$$(\mathcal{V}^1(X), k_r) \leftarrow (\mathcal{V}^2(X), k_r^2) \leftarrow \cdots.$$

Thus we obtain the projective limit

$$\widehat{\mathcal{V}}^\infty = \operatorname{projlim}_n (\mathcal{V}^n(X), P_n).$$

Since $P_n F_n = I_{n-1}$, the inductive limit $\mathcal{V}^\infty$ is naturally embedded into
the projective limit:

$$\mathcal{V}^\infty \subset \widehat{\mathcal{V}}^\infty;$$

but, in contrast to the case of the inductive limit, on the projective limit
there is no natural metric. 3

The main application of this tower of measures is as follows. Assume
that we have a “metric triple” $(X, r, \mu)$, i.e., a measure space with a
metric or semimetric, and a decreasing sequence of measurable partitions
of this space (a discrete filtration) $\{\xi_n\}$, $n = 0, 1, \dots$; here $\xi_0$ is
trivial and $\xi_n \succ \xi_{n+1}$.

First consider one partition $\xi$; for almost all points $a \in X/\xi$ of the
quotient space with respect to this partition, there is a well-defined
conditional measure on the element of $\xi$ corresponding to $a$. We regard
it as a measure on $(X, r)$; thus we have a mapping $f_\xi : X/\xi \to \mathcal{V}(X, r)$,
which sends almost every point $a \in X/\xi$ to a (conditional) measure
on $(X, r)$. It is convenient to regard this mapping as a function from
$(X, \mu)$ to $\mathcal{V}(X)$.

Now define a metric (or semimetric) on $X/\xi$ as follows: for almost
all pairs of points $a, b \in X/\xi$, define the distance between them as the
Kantorovich distance between the corresponding conditional measures.
Thus we have defined a metric (or semimetric) on a subset of full measure
in the quotient space $X/\xi$; it can also be regarded as a semimetric
on the original space $(X, \mu)$.

Apply this process to the decreasing sequence of partitions $\{\xi_n\}$:
start from $\xi_1$, then define a metric on $X/\xi_1$, a mapping $f_1 : X \to
\mathcal{V}(X, r)$, and a partition $\xi_2/\xi_1$; now we have a mapping from $X/\xi_2$ to
$\mathcal{V}^2(X)$, a new metric on $X/\xi_2$, and a map $f_2 : X \to \mathcal{V}^2(X, r)$.

Continuing this process, we obtain mappings $f_n$ from $(X, \mu)$ to the
iterated spaces $\mathcal{V}^n(X, r)$, or to the inductive limit $(\mathcal{V}^\infty, k_r^\infty)$.

3 Inductive systems having projections that are the right inverses to the
embeddings can be called indo-projective systems; they appear quite often.

One of the main results of the theory of decreasing sequences ([20], 
[23]) is the following theorem. 

Theorem 2. A decreasing homogeneous sequence of measurable partitions
is standard (see [20, 23] for definitions) if and only if the sequence
of measures $f_n * \mu$ (in other words, the sequence of the distributions of
the mappings $f_n$ with respect to the measure $\mu$), regarded as a sequence
of measures on the inductive limit $(\mathcal{V}^\infty, k_r^\infty)$, tends to a $\delta$-measure.

A discussion of these subjects can be found in [23] and in forthcoming 
papers. 

4. The Kantorovich metric in Ornstein’s theory 

In the early 70s, Donald Ornstein solved a long-standing problem
in ergodic theory: he gave necessary and sufficient conditions on a
discrete-time stationary random process under which the shift in the
space of trajectories of this process is isomorphic to a Bernoulli shift;
using this result, he proved that the Kolmogorov entropy is a complete
invariant of Bernoulli shifts ([17]). We will formulate the main theorem
of Ornstein’s theory in order to illustrate the role of the Kantorovich
metric, which was rediscovered by Ornstein (he called it the $\bar{d}$-metric).

Assume that the state space $S$ of a stationary process is finite and
$\mu$ is the stationary measure on $S^{\mathbb{Z}}$ generated by this process. The
question is formulated as follows: when does there exist an isomorphism (in
the measure-theoretic sense) between the Bernoulli space $S'^{\mathbb{Z}}$ with a product
measure and the space $(S^{\mathbb{Z}}, \mu)$ that commutes with the shift? This is
the well-known isomorphism problem in ergodic theory. It is clear that
the criterion of existence of such an isomorphism must be expressed
in terms of the rate of decrease of the correlation between the past
and the future of the process. There are many known conditions of
this type, which are sometimes called “mixing conditions.” Most of
such conditions known in the theory of stationary processes are too
strong (Kolmogorov’s, Rosenblatt’s, Ibragimov’s conditions, etc.). It
turned out that the right notion is related to the Kantorovich metric
on the space of words with the Hamming metric - this was discovered
by D. Ornstein. Our interpretation slightly differs from the original
one, but is closer to the previous context (see [29]).

Let $\{\xi_n\}$, $n \in \mathbb{Z}$, be a stationary random process with finite state
space $S$ and shift-invariant measure $\mu$ on $S^{\mathbb{Z}}$. Consider the “past” of
the process: $\mathcal{P} = \prod_{-\infty}^{0} S$; the projection of $\mu$ to $\mathcal{P}$ will be denoted
by $\mu^-$. Fix a point $x^- = (x_0, x_{-1}, x_{-2}, \dots) \in \mathcal{P}$ and consider the
conditional distribution on the $n$-future given a fixed past $x^-$:

$$P^n(x_1, x_2, \dots, x_n \mid x^-);$$

this is a measure on the $n$-future $S^n$ defined for almost all points $x^- \in
\mathcal{P}$; it is an element of $\mathcal{V}(S^n)$, thus we have a mapping $F_n : \mathcal{P} \to \mathcal{V}(S^n)$
defined almost everywhere.

Consider the Hamming metric on $S^n$:

$$h_n(x, y) = \frac{1}{n}\, \#\{i \in (1, \dots, n) : x_i \ne y_i\},$$

where $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n) \in S^n$ and $\#$ stands for the
number of points in a set; and let $k_{h_n}$ be the Kantorovich metric on
the space $\mathcal{V}(S^n, h_n)$ of measures on the $n$-future.

Theorem 3. [17, 24] Consider a stationary process $\{\xi_n\}$, $n \in \mathbb{Z}$, and
the right shift in the space of realizations generated by this process. An
invertible encoding of this shift into a Bernoulli shift (in other words, a
measure-preserving isomorphism of the shift in the space of realizations
of the process and a Bernoulli shift) exists if and only if

$$\lim_{n \to \infty} \iint_{x^- \in \mathcal{P},\, y^- \in \mathcal{P}} k_{h_n}\bigl(P^n(\cdot \mid x^-), P^n(\cdot \mid y^-)\bigr)\, d\mu^-(x^-)\, d\mu^-(y^-) = 0$$

(the integral of the value of the Kantorovich metric for the pair of
conditional measures corresponding to a pair of points from $\mathcal{P} \times \mathcal{P}$ with
respect to the product measure $\mu^- \times \mu^-$).

The literal meaning of the above condition is very transparent: it 
means that the conditional distribution on the future given a fixed 
past asymptotically does not depend on the past; roughly speaking, 
there is only one type of distribution on the future; but a more precise 
sense of these words essentially depends on the choice of a metric on 
the space of realizations of the process (we should take the Hamming 
metric) and a metric on the spaces of measures (here we should use 
the Kantorovich metric); in general, the conclusion of the theorem will 
be false if we replace the Kantorovich metric by some other one (for 
example, by the variation metric). 
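
As a sanity check, the condition holds trivially for a process that is already Bernoulli. The computation below is a sketch; the symbol $\pi$ for the one-letter distribution on $S$ is notation introduced here for illustration.

```latex
% Independent identically distributed letters with distribution \pi on S:
% the future does not depend on the past, so for almost all x^-, y^-
\[
P^n(\cdot \mid x^-) = \pi^{\otimes n} = P^n(\cdot \mid y^-)
\quad\Longrightarrow\quad
k_{h_n}\bigl(P^n(\cdot \mid x^-),\, P^n(\cdot \mid y^-)\bigr) = 0
\quad\text{for every } n,
\]
% and hence the double integral in the criterion vanishes identically.
```

For a general process the conditional measures do depend on the past, and the criterion asks only that this dependence become negligible on average in the Kantorovich metric built on the Hamming metric.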

The last formulation also motivates the definition of the so-called
secondary entropy of a stationary process (see [24]). Define $M^+$ as the
image of the measure $\mu^-$ (see above) under the mapping $F_n : \mathcal{P} \to
\mathcal{V}(S^n, h_n)$; this is a measure on $\mathcal{V}(S^n, h_n)$. In the case of Bernoulli
automorphisms, by Ornstein’s theorem, the measure $M^+$ tends to a
$\delta$-measure as $n \to \infty$. But for a general Kolmogorov stationary process
(K-automorphism), this is not the case. More precisely, if the automorphism
is not a Bernoulli automorphism, then the limit exists, but
is not a $\delta$-measure. Thus it is natural to introduce a characteristic of
the limiting measure. Namely, we may consider the so-called $\varepsilon$-entropy
of the measure $M^+$. This notion also uses the Kantorovich metric. For
an arbitrary Borel probability measure $\nu$ on a metric space $(X, d)$, the
$\varepsilon$-entropy $h_\varepsilon(\nu)$ (as a function of $\varepsilon$) is defined as follows:

$$h_\varepsilon(\nu) = \inf\{H(l) : k_d(l, \nu) < \varepsilon\},$$

where the infimum is taken over all discrete measures $l$ on $(X, d)$ and
$H(l)$ is the ordinary entropy of a discrete measure: $H(l) = -\sum_i l_i \log l_i$,
$l = (l_1, \dots, l_n)$, $\sum_i l_i = 1$, $l_i \ge 0$, $i = 1, \dots, n$.

The asymptotics of $h_\varepsilon(M^+)$ with respect to $n$ is called the secondary
entropy of the process. An open problem: what kind of asymptotic
behavior can appear? Presumably, the secondary entropy is a metric
invariant of K-automorphisms.

5. Application to the classification of metric spaces 

Consider a Polish (= metric, complete, separable) space with a Borel
probability measure. We call such a space a metric triple (another
term is an mm-space [7]). Two triples $(X, \rho, \mu)$ and $(X', \rho', \mu')$ are
isomorphic if there exists a mapping $T : X \to X'$ that is an isometry
and preserves the measures: $\rho'(Tx, Ty) = \rho(x, y)$ and $T\mu = \mu'$.

We regard the metric as a measurable function of two variables:

$$\rho : X \times X \to \mathbb{R}.$$

(The theorem below is true for an arbitrary symmetric measurable
function $\rho$, not necessarily a metric.)

Let $X^\infty$ be the product of infinitely many copies of the space $X$.
Define a mapping

$$F : X^\infty \to M_\infty(\mathbb{R})$$

from $X^\infty$ to the set of symmetric matrices as follows: $F(x) = \{r_{i,j}\}_{i,j=1}^\infty$,
where $x = (x_1, x_2, \dots)$ and $r_{i,j} = \rho(x_i, x_j)$.

Let us denote the image of the measure $\mu^\infty$ under the mapping $F$
by $F(\mu^\infty) = D_\rho$; the measure $D_\rho$ on $M_\infty(\mathbb{R})$ will be called the matrix
distribution of the function $\rho$.

In [25], we considered and classified general (nonsymmetric) measurable
functions $f(x, y)$ of two variables on the space $(X \times X, \mu \times \mu)$
up to mappings of the form $T_1 \times T_2$, where $T_1$ and $T_2$ are
measure-preserving automorphisms of $(X, \mu)$. We also defined the notion of
matrix distribution for this case; it is a complete invariant for so-called
pure functions. But now we need another classification. We also consider
arbitrary measurable (nonsymmetric) functions $f$ on the space
$(X \times X, \mu \times \mu)$, where $(X, \mu)$ is a Lebesgue space with continuous measure,
but we classify them up to mappings of the form $T \times T$, where
$T$ is an automorphism of $(X, \mu)$ (in other words, $T_1 = T_2$). Namely,
define a mapping

$$F_f : X^\infty \to M_\infty(\mathbb{R}),$$

where $F_f(x) = \{f(x_i, x_j)\}_{i,j=1}^\infty$ and $x = (x_1, x_2, \dots) \in X^\infty$; here
$M_\infty(\mathbb{R})$ is the set of arbitrary (not necessarily symmetric) matrices.
The $F_f$-image of the measure $\mu^\infty$, which is a measure on $M_\infty(\mathbb{R})$, is
called the symmetric matrix distribution of the function $f$ and denoted
by $D^s_f$.

Theorem 4. (Gromov [7], Vershik [25])

(1) Two metric triples $(X, \rho, \mu)$ and $(X', \rho', \mu')$ are isomorphic if and
only if their matrix distributions coincide:

$$D^s_\rho = D^s_{\rho'}.$$

In other words, the matrix distribution of the metric is a complete
invariant of a metric triple.

(2) (Vershik [25]). The symmetric matrix distribution $D^s_f$ of a measurable
function $f(\cdot, \cdot)$ of two variables is a complete metric invariant
of the function regarded up to automorphisms of the form $T \times T$, where
$T$ is an automorphism of $(X, \mu)$.


Now we apply this classification to MK-problems. Let $X$ be a compact
metric space with metric $\rho$; we want to “transport” a Borel probability
measure $\mu_1$ to another Borel probability measure $\mu_2$. Thus we
have two metric triples: $(X, \rho, \mu_1)$ and $(X, \rho, \mu_2)$. It is more convenient
to reduce the problem to a more symmetric form and to have one
metric triple. Let us consider only continuous measures; then we can
choose a measure-preserving isomorphism $S : (X, \mu_2) \to (X, \mu_1)$. Let
$f(x, y) = \rho(x, Sy)$, so that $f$ is a nonnegative measurable (in general,
nonsymmetric) function of two variables - the “shifted metric.” Now
we can consider only one measure $\mu_1 = \mu$ and the function $f$ on the
space $(X \times X, \mu \times \mu)$.

In terms of the shifted metric, the MK-problem can be formulated
as follows: to find

$$k = \inf_L \int_{X \times X} f(x_1, x_2)\, dL,$$

where $L$ runs over all Borel measures on the product $X \times X$ with both
marginal measures equal to the measure $\mu$; thus $L$ belongs to the set of
bistochastic measures, or, in other words, $L$ is an element of the semigroup
of polymorphisms with invariant continuous measure $\mu$ (see [22]
for definitions). Thus the MK-problem turns into a variational problem
on the convex set of bistochastic measures (or on the semigroup of
polymorphisms).

The strong MK-problem reads as follows: to find

$$\bar{k} = \inf_T \int_X f(x, Tx)\, d\mu(x),$$

where $T$ runs over all $\mu$-preserving transformations of $(X, \mu)$. In this
case, we have a variational problem on the group of measure-preserving
transformations.

Now we can apply the above-defined symmetric matrix distribution
$D^s_f$ of the function $f$ regarded as a measurable function (shifted metric)
on the space $(X \times X, \mu \times \mu)$. Since $D^s_f$ is a complete invariant of
the triple $(X, f, \mu)$, all properties of the (ordinary and strong) MK-problem
can be expressed in terms of $D^s_f$ as a measure on the space of
matrices $M_\infty(\mathbb{R})$. But this means that we have a random matrix with
distribution $D^s_f$, which we can use for analysis of the problem. Here
we describe only one example of applying this approach.

Let $r = \{r_{i,j}\}_{i,j=1}^\infty$ be a random matrix with distribution $D^s_f$. The new
version of the MK-problem reads as follows. Choose a random matrix
$r$, for each $n$ consider the ordinary finite transportation problem, and
define

$$k_n(r) = \inf_l \frac{1}{n} \sum_{i,j=1}^{n} r_{i,j}\, l_{i,j},$$

where $l = \{l_{i,j}\}_{i,j=1}^n$ is a bistochastic matrix (i.e., $\sum_{i=1}^n l_{i,j} = \sum_{j=1}^n l_{i,j} = 1$,
$l_{i,j} \ge 0$ for all $i, j = 1, \dots, n$) and $r_n = \{r_{i,j}\}_{i,j=1}^n$ is the $n$-fragment of
$r$ (the random matrix constructed from the shifted metric as described
above). Thus $k_n(r)$ is a random variable that depends on the random
matrix $r$.

Theorem 5. In the previous notation,

$$\lim_{n \to \infty} k_n(r) = k \quad \text{in measure } D^s_f,$$

where $k$ is the solution of the original MK-problem, i.e., the sequence
of random variables $k_n(r)$ converges in measure $D^s_f$ to the solution of
the MK-problem.

A natural conjecture: for almost every choice of the matrix $r = \{r_{i,j}\}$
with respect to the measure $D^s_f$, the same assertion is true:

$$D^s_f\{r : \lim_{n \to \infty} k_n(r) = k\} = 1,$$

which means that $k_n(r)$ converges to $k$ with probability one with respect
to the choice of the matrix $r$ according to the measure $D^s_f$.

Note that we approximate the MK-problem with the simplest finite-dimensional
problem of linear programming - the allocation (assignment) problem.
By the Birkhoff-von Neumann theorem, the solution of this problem is
a permutation, i.e., an element of the symmetric group, or an extreme
point of the convex set of bistochastic matrices (the so-called Hungarian
polytope). Nevertheless, the question of when the strong MK-problem
has a solution and how it can be approximated by permutations is more
involved.
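
The finite problem can be tried out numerically on a toy example. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes $X = [0, 1]$ with $\rho(x, y) = |x - y|$, Lebesgue measure, and the hypothetical measure-preserving map $S(y) = 1 - y$, so that the two transported measures coincide and $k = 0$. The function names and the brute-force search over permutations (justified by the Birkhoff-von Neumann theorem, and feasible only for small $n$) are choices made here; the $1/n$ normalization is assumed so that the value is comparable with $k$.

```python
import itertools
import random


def shifted_metric(x, y):
    # Hypothetical example: rho(x, y) = |x - y| on [0, 1] and S(y) = 1 - y,
    # which preserves Lebesgue measure, so f(x, y) = rho(x, S y).
    return abs(x - (1.0 - y))


def k_n(points, f):
    """Brute-force value of the n x n assignment problem with costs
    r[i][j] = f(points[i], points[j]), normalized by 1/n (an assumption
    of this sketch)."""
    n = len(points)
    r = [[f(points[i], points[j]) for j in range(n)] for i in range(n)]
    # A linear functional on bistochastic matrices attains its minimum at
    # a permutation matrix, so it suffices to scan all permutations.
    best = min(
        sum(r[i][sigma[i]] for i in range(n))
        for sigma in itertools.permutations(range(n))
    )
    return best / n


rng = random.Random(0)                     # fixed seed for reproducibility
sample = [rng.random() for _ in range(6)]  # x_1, ..., x_6 drawn from mu
value = k_n(sample, shifted_metric)
print(value)
```

The points `sample` play the role of the first $n$ coordinates of a $\mu^\infty$-random sequence, and the matrix `r` is the corresponding $n$-fragment of the random matrix. For larger $n$ one would replace the scan over permutations by a polynomial-time assignment solver.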

The theorem and conjecture given above are typical for applications 
of our method to various problems with integral kernel: we obtain 
a probabilistic approximation of a functional or variational problem 
using a random choice of values of the function. We will return to this 
elsewhere. 

Partially supported by the RFBR, project 02-01-00093, and the 
President of Russian Federation grant for support of leading scientific 
schools NSh-2251.2003.1. 

Translated by A. M. Vershik and N. V. Tsilevich. 

References 

[1] A. Barvinok, A Course in Convexity, Amer. Math. Soc., Providence, Rhode 
Island (2002). 

[2] Y. Brenier, “Extended Monge-Kantorovich theory,” Lecture Notes in Math.,
1813, 91-122 (2003).

[3] M. Emery, “Espaces probabilisés filtrés: de la théorie de Vershik au mouvement
brownien, via les idées de Tsirelson,” Séminaire Bourbaki, No. 882 (2000).

[4] U. Frisch, Turbulence. The Legacy of A. N. Kolmogorov, Cambridge Univ. Press,
Cambridge (1995).

[5] W. Gangbo and R. J. McCann, “The geometry of optimal transportation,”
Acta Math., 177, No. 2, 113-161 (1996).

[6] M. K. Gavurin and L. V. Kantorovich, “Application of mathematical methods
to problems of analysis of freight flows,” in: Problems of Raising the Efficiency
of Transport Performance [in Russian], Moscow-Leningrad (1949), pp. 110-138.

[7] M. Gromov, Metric Structures for Riemannian and Non-Riemannian Spaces,
Birkhäuser, Boston (1999).

[8] L. V. Kantorovich, Mathematical Methods in the Organization and Planning 
of Production [in Russian], Leningrad (1939). 

[9] L. V. Kantorovich, “On an efficient method of solving some classes of extremal 
problems,” Dokl. Akad. Nauk SSSR, 28, No. 3, 212-215 (1940). 

[10] L. V. Kantorovich, “On the translocation of masses,” Dokl. Akad. Nauk SSSR, 
37, Nos. 7-8, 227-229 (1942). 





[11] L. V. Kantorovich, “On a problem of Monge,” Uspekhi Mat. Nauk, 3, No. 2, 
225-226 (1948). 

[12] L. V. Kantorovich, “Functional analysis and applied mathematics,” Uspekhi 
Mat. Nauk, 3, No. 6, 89-185 (1948). 

[13] L. V. Kantorovich, Economical Calculation of the Best Use of Resources [in 
Russian], Moscow (1960). 

[14] L. V. Kantorovich, “On new approaches to computational methods and
processing of observations,” Sib. Mat. Zhurn., 3, No. 5, 701-709 (1962).

[15] L. V. Kantorovich and G. Sh. Rubinshtein, “On a space of totally additive
functions,” Vestn. Leningr. Univ., 13, No. 7, 52-59 (1958).

[16] Leonid Vitalievich Kantorovich: Man and Scientist, vol. 1, Novosibirsk (2002).

[17] D. Ornstein, Ergodic Theory, Randomness, and Dynamical Systems, Yale Univ.
Press, New Haven-London (1974).

[18] S. T. Rachev, Probability Metrics and the Stability of Stochastic Models, Wiley,
Chichester (1991).

[19] A. M. Vershik, “Some remarks on infinite-dimensional problems of linear
programming,” Uspekhi Mat. Nauk, 25, No. 5, 117-124 (1970).

[20] A. M. Vershik, “Decreasing sequences of measurable partitions and their
applications,” Sov. Math. Dokl., 11, No. 4, 1007-1011 (1970).

[21] A. M. Vershik, “On D. Ornstein’s papers, weak dependence conditions and
classes of stationary measures,” Theory Probab. Appl., 21, 655-657 (1977).

[22] A. M. Vershik, “Multivalued mappings with invariant measure (polymorphisms)
and Markov operators,” J. Sov. Math., 23, 2243-2266 (1983).

[23] A. M. Vershik, “Theory of decreasing sequences of measurable partitions,” 
St. Petersburg Math. J., 6, No. 4, 705-761 (1994). 

[24] A. M. Vershik, “Dynamic theory of growth in groups: entropy, boundaries,
examples,” Russian Math. Surveys, 55, No. 4, 667-733 (2000).

[25] A. M. Vershik, “Classification of measurable functions of several arguments,
and invariantly distributed random matrices,” Funct. Anal. Appl., 36, No. 2,
93-105 (2002).

[26] A. M. Vershik, “About L. V. Kantorovich and linear programming,” in:
Leonid Vitalievich Kantorovich: Man and Scientist, vol. 1, Novosibirsk (2002),
pp. 130-152.

[27] A. Vershik, “Polymorphisms, Markov processes, quasi-similarity of
K-automorphisms,” to appear in Discrete Contin. Dyn. Syst.

[28] A. M. Vershik and M. M. Rubinov, “General duality theorem in linear
programming,” in: Mathematical Economics and Functional Analysis [in Russian],
Nauka, Moscow (1974), pp. 35-55.

[29] C. Villani, Topics in Optimal Transportation, Amer. Math. Soc., Providence,
Rhode Island (2000).

[30] Optimal Transportation and Applications, L. A. Caffarelli and S. Salsa (eds.),
Lecture Notes in Math., 1813, Springer (2003).

Mathematical Institute of Russian Ac.Sci. St.Petersburg branch, 

Fontanka 27, St.Petersburg, 191023, Russia. 

E-mail address : vershik@pdmi.ras.ru 



